PTMs and half-lives


Proteins with a short half-life

Proteins can have varying half-lives

  • What do the half-lives depend on?

  • How are they measured?

Below, the distribution of half-lives reported in the literature is compared with the distribution for the subset of those proteins that are also found in the dataset.

Although the mean half-life of these proteins is higher than that of the whole dataset, none of the outlier proteins can be classified as long-lived. Proteins are classified as long-lived when their mean half-life exceeds 48 hours (ref), although this threshold is arbitrary.
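
The 48-hour cut-off can be applied as in the minimal sketch below. The data frame `hl` and all of its values are invented for illustration; only the column names `LeadProt` and `mean_hl_hours` are taken from this analysis.

```r
# Hypothetical example data: accessions and half-lives are made up.
hl <- data.frame(
  LeadProt = c("P1", "P2", "P3", "P4"),
  mean_hl_hours = c(12.5, 48, 60.2, NA),
  stringsAsFactors = FALSE
)

long_lived_cutoff <- 48  # hours; the (arbitrary) threshold quoted in the text

# Strictly greater than the cut-off counts as long-lived; a missing
# half-life stays NA rather than being silently counted as short-lived.
hl$long_lived <- hl$mean_hl_hours > long_lived_cutoff

table(hl$long_lived, useNA = "ifany")
```

Keeping `NA` explicit makes it easy to see how many proteins could not be classified at all.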

The enriched proteins in the peak, i.e. those with half-lives of 20-25 hours, are listed below:

    LeadProt
1     Q15788
2     Q8WUH6
3     P38606
4     O95619
5     Q9C0D3
6     P63208
7     Q15643
8     P31629
9     Q9BV73
10    O94986
11    Q92769
12    Q15056
13    Q96MF7
14    P36954
15    Q9UQ35
16    Q13049
17    Q86U90
18    Q96G01
19    Q9H410
20    Q0VGL1
21    Q13615
22    Q96JN8
23    P49790
24    O60216
25    Q15036
26    Q9UMZ2
27    P29374
28    O95400
29    Q9P2M7
30    Q9ULV3
31    Q16206
32    Q9H3H1
33    Q8WXI9
34    O00560
35    Q9BZF9
36    Q9HCK1
37    O60315
38    Q8N1G0
39    Q96F63
40    P42771
41    Q9P2D1
42    Q7Z7K0
43    Q66GS9
44    P16220
45    Q96FZ2
46    P31321
47    O60303
48    Q9P2P6
49    Q6P0Q8
50    Q86YC2
51    Q9HBE1
52    Q9BSM1
53    O15212
54    O60671
55    Q02833
56    P28749
57    P62487
58    Q2M3G4
59    Q9H0K1
60    Q96BN2
61    P51965
62    P00374
63    O15350
64    O00459
65    P28289
66    Q9HBM1
67    O15164
68    O43303
69    Q14159
70    Q9H0A8
71    Q69YQ0
72    Q96JK2
73    O00165
74    Q9H0F6
75    Q9UQR1
76    Q9ULW3
77    Q4LE39
78    Q9HCU9
79    P51946
80    Q8IVH2
81    Q8N5I9
82    Q9H0Z9
83    O75528
84    Q16594
85    Q00537
86    Q9Y592
87    Q9H9F9
88    Q9BTT4
89    Q08999
90    Q9UPW6
91    Q14186
92    Q9UI30
93    Q8N8D1
94    P10071
95    Q8NHQ8
96    P42568
97    Q96IK1
98    Q9Y6R9
99    Q86X02
100   Q9P0K8
101   Q8N5Y2
102   Q9NS91
103   Q9BQ65
104   Q8IYN0
105   Q8TC92
106   Q96H20
107   P55789
108   Q8ND83
109   Q9C037
110   P36543
111   Q6PIY7
112   Q7L273
113   O00463
114   Q96JM7
115   Q9BQ15
116   Q6PII3
117   Q9NYR9
118   Q13487
119   O43167
120   O43739
121   Q9NPJ6
122   Q9H999
123   Q8N300
124   O96006
125   Q9BRR0
126   O15131
127   Q16342
128   Q96Q83

The proteins in this peak are transcription factors, according to the DAVID annotation (identifier type: UNIPROT_ACCESSION).

PTMs

Using genes from GenAge is legitimate, so this approach can be continued.

Prediction and characterization of human ageing-related proteins by using machine learning | Scientific Reports (nature.com)

PTMs of interest:

  • PTMs that control autophagy

    • phosphorylation

    • ubiquitination

    • acetylation

  • oxPTMs

    • a list of these is already available

  • Methylation, e.g. of histones

Phosphorylation

  • Only the modification [21]Phospho is present here.

The dataset is split into a group of phosphorylated proteins and a group containing all remaining proteins.
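
One way to make such a split at the protein level is sketched below. This is an assumption about how the groups could be formed, not necessarily what was done here: the B-cell code later in this document filters individual modification rows with `unimod_id == 21` instead. The table `ptms` and its values are hypothetical.

```r
# Hypothetical modification table: one row per protein/modification pair.
ptms <- data.frame(
  LeadProt  = c("A", "A", "B", "C", "C"),
  unimod_id = c(21, 35, 35, 1, 21),
  stringsAsFactors = FALSE
)

# Proteins with at least one phosphorylation row (Unimod ID 21).
pho_proteins <- unique(ptms$LeadProt[ptms$unimod_id == 21])

# Protein-level split: all rows of a phosphorylated protein go to one group,
# so no protein can appear in both groups.
is_pho      <- ptms$LeadProt %in% pho_proteins
ptms_pho    <- ptms[is_pho, ]
ptms_no_pho <- ptms[!is_pho, ]
```

With a row-level filter, a protein carrying both phosphorylation and another modification contributes to both groups, which matters when the two half-life distributions are compared.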

What are the phosphorylated proteins in that peak?

The list of ageing-associated genes was obtained from GenAge.

Distribution of phosphorylated ageing proteins vs phosphorylated non-ageing proteins vs non-phosphorylated proteins vs non-phosphorylated ageing proteins.

Methylation

  • Filtered by the [34]Methyl modification

Acetylation

  • Filtered by the [1]Acetyl modification

Oxidation

  • Only modification: [35]Oxidation

oxPTMs

All PTMs related to oxidative damage in general, not only [35]Oxidation.

As can be seen in the above graphs, the presence of the small peak, which represents proteins with very short half-lives, varies depending on the type of modification. It is more prominent in PTMs that are related to ageing.

Line graph

Hypothesis: proteins with higher mean half-lives remain in the cell for longer, therefore they are more susceptible to oxidative damage and will accumulate more oxPTMs.

Approach:

  • Count the total number of oxPTMs on each protein (using normalised counts)

  • Plot the mean half-life of the proteins vs their total oxPTMs count
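
The two steps above can be sketched as follows. All values are invented; `oxPTMs$ID`, `LeadProt`, `unimod_id`, `norm_counts`, `mean_hl_hours` and `sum_mod_count` are names that appear in this analysis, while the toy tables and the use of a Spearman correlation are assumptions made for illustration.

```r
# Hypothetical stand-in for the list of oxPTM Unimod IDs.
oxPTMs <- data.frame(ID = c(35, 425))

# Hypothetical modification table (values are made up).
ptms <- data.frame(
  LeadProt      = c("A", "A", "B", "B", "C", "D"),
  unimod_id     = c(35, 21, 35, 425, 35, 1),
  norm_counts   = c(2, 5, 1, 3, 10, 4),
  mean_hl_hours = c(10, 10, 30, 30, 80, 50),
  stringsAsFactors = FALSE
)

# Keep only rows that carry an oxPTM.
ox <- ptms[ptms$unimod_id %in% oxPTMs$ID, ]

# Step 1: total normalised oxPTM count per protein
# (the half-life is constant within a protein, so it can be a grouping key).
per_protein <- aggregate(norm_counts ~ LeadProt + mean_hl_hours,
                         data = ox, FUN = sum)
names(per_protein)[names(per_protein) == "norm_counts"] <- "sum_mod_count"

# Step 2 (numeric companion to the plot): rank correlation between
# half-life and total oxPTM count.
rho <- cor(per_protein$mean_hl_hours, per_protein$sum_mod_count,
           method = "spearman")
```

Proteins without any oxPTM (protein `D` here) drop out of `per_protein`. If unmodified proteins should count as zero, they need to be joined back in before plotting or correlating.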

The plot below shows how the number of oxPTMs changes with the mean half-life of the proteins. Proteins identified as having a very short half-life are shown in blue. The blue and red points follow a similar pattern: a few proteins carry a very high number of modifications, while most carry very few. Most proteins appear to remain largely unmodified irrespective of their half-life.

Below, we look at the total counts summed for each modification type.

The scatter plots would be more informative if they also included IUPred scores, localisation, etc.

Phosphorylation

Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'LeadProt'. You can override using the
`.groups` argument.
Warning in mean.default(pho_age_prot$sum_mod_count): argument is not numeric or
logical: returning NA
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_hline()`).

Oxidation

Warning in mean.default(pho_age_prot$sum_mod_count): argument is not numeric or
logical: returning NA
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_hline()`).

Acetylation


oxPTMs

Bar chart

Hypothesis: The higher the half-life, the greater the number of PTMs.

Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'hl_group', 'mod_group'. You can override
using the `.groups` argument.
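
A minimal sketch of the binning behind such a bar chart is given below. The bin edges are borrowed from the B-cell code later in this document and may not match the ones used for this particular chart; the data are invented. Base R `cut()` is used here as one possible approach.

```r
# Hypothetical data: half-lives in hours and normalised PTM counts.
df <- data.frame(
  mean_hl_hours = c(5, 50, 75, 120, 260, NA),
  norm_counts   = c(1, 3, 2, 4, 6, 9)
)

# Right-closed bins: [0,50], (50,100], ..., (250,Inf)
breaks <- c(0, 50, 100, 150, 200, 250, Inf)
labels <- c("0-50", "50-100", "100-150", "150-200", "200-250", "250+")

df$hl_group <- cut(df$mean_hl_hours, breaks = breaks, labels = labels,
                   include.lowest = TRUE)

# Mean normalised PTM count per half-life bin; empty bins come back as NA.
mean_per_bin <- tapply(df$norm_counts, df$hl_group, mean)
mean_per_bin
```

Unlike a `case_when()` that ends in `TRUE ~ "250+"`, `cut()` leaves a missing half-life as `NA`, so proteins without a half-life estimate are not counted in the longest-lived bin.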

Proteins with a long half-life

Long-lived proteins can be used as estimators of chronological age. They can be defined in different ways, for example by comparing the half-life of a protein with the average half-life of proteins in the organism. Here, long-lived proteins were taken from the following study: paper. That study classified proteins as long-lived based on their degree of degradation during the experiment, so it could discover new long-lived proteins without a priori assumptions.

The study identified a list of long-lived proteins in rats, so the human orthologs of these proteins were identified.

PTMs

Phosphorylation

Methylation

Warning: Removed 494739 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 44297 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 29244 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 3624 rows containing non-finite outside the scale range
(`stat_density()`).

Acetylation

Warning: Removed 525274 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 16410 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 31892 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 976 rows containing non-finite outside the scale range
(`stat_density()`).

Oxidation

Warning: Removed 496997 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 42625 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 29830 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 3038 rows containing non-finite outside the scale range
(`stat_density()`).

oxPTMs

Warning: Removed 442146 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 94211 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 26565 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 6303 rows containing non-finite outside the scale range
(`stat_density()`).

B cells

cytoplasmic projects:

  • old B cells: PXD006570

  • young B cells: PXD006572

nuclear proteins:

  • old B cells: PXD006571

  • young B cells: PXD006576

Distribution of the half-lives

Take the cytoplasmic proteins and plot two density plots to compare the distributions of the half-lives between old and young B cells.

PTMs

Phosphorylation

For cytoplasmic proteins, what are the differences between proteins from old and young B cells?

library(dplyr)
library(ggplot2)

# Cytoplasmic data: rows carrying phosphorylation (Unimod ID 21) vs rows carrying any other modification
human_ptms_cyto_old_hl_pho <- human_ptms_cyto_old_hl %>% filter(unimod_id == 21)
human_ptms_cyto_old_hl_no_pho <- human_ptms_cyto_old_hl %>% filter(unimod_id != 21)
human_ptms_cyto_young_hl_no_pho <- human_ptms_cyto_young_hl %>% filter(unimod_id != 21)
human_ptms_cyto_young_hl_pho <- human_ptms_cyto_young_hl %>% filter(unimod_id == 21)

# Nuclear data: same split (not used in the plot below)
human_ptms_nuc_old_hl_pho <- human_ptms_nuc_old_hl %>% filter(unimod_id == 21)
human_ptms_nuc_old_hl_no_pho <- human_ptms_nuc_old_hl %>% filter(unimod_id != 21)
human_ptms_nuc_young_hl_pho <- human_ptms_nuc_young_hl %>% filter(unimod_id == 21)
human_ptms_nuc_young_hl_no_pho <- human_ptms_nuc_young_hl %>% filter(unimod_id != 21)

# Density of mean half-lives, weighted by normalised PTM counts;
# rows with a half-life outside 0-100 hours are dropped by the x-axis limits
ggplot() +
  geom_density(data = human_ptms_cyto_old_hl_pho, aes(x = mean_hl_hours, weight = norm_counts, fill = 'human_ptms_cyto_old_hl_pho'), alpha = 0.7, bw = 2) +
  geom_density(data = human_ptms_cyto_old_hl_no_pho, aes(x = mean_hl_hours, weight = norm_counts, fill = 'human_ptms_cyto_old_hl_no_pho'), alpha = 0.7, bw = 2) +
  geom_density(data = human_ptms_cyto_young_hl_no_pho, aes(x = mean_hl_hours, weight = norm_counts, fill = 'human_ptms_cyto_young_hl_no_pho'), alpha = 0.7, bw = 2) +
  geom_density(data = human_ptms_cyto_young_hl_pho, aes(x = mean_hl_hours, weight = norm_counts, fill = 'human_ptms_cyto_young_hl_pho'), alpha = 0.7, bw = 2) +
  labs(x = 'Mean half-lives (hours)', y = 'Density') +
  scale_x_continuous(limits = c(0, 100)) +
  theme_classic() +
  # Manually specify fill colours
  scale_fill_manual(values = c('human_ptms_cyto_old_hl_pho' = '#EA7317',
                               'human_ptms_cyto_old_hl_no_pho' = '#FFB703',
                               'human_ptms_cyto_young_hl_no_pho' = '#5DB7B1',
                               'human_ptms_cyto_young_hl_pho' = '#3DA5D9'),
                    name = 'Legend')
Warning: Removed 51 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 505 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 398 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 30 rows containing non-finite outside the scale range
(`stat_density()`).

Methylation

Acetylation

Oxidation

oxPTMs

Approach:

  • Get a list of PTMs that correlate with ageing such as oxPTMs, acetylation etc.

  • Test whether the abundance of these PTMs changes between the long-lived proteins and the normal proteins.
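
The second step could, for example, use a Wilcoxon rank-sum test, as in the sketch below. The choice of test is an assumption (no test is specified in this analysis), and the group labels and counts are invented; only the column name `sum_mod_count` comes from this document.

```r
# Hypothetical per-protein PTM totals for two groups of proteins.
ptm_counts <- data.frame(
  group = rep(c("long_lived", "other"), each = 6),
  sum_mod_count = c(8, 12, 9, 15, 11, 14, 3, 5, 2, 6, 4, 7),
  stringsAsFactors = FALSE
)

# Two-sided Wilcoxon rank-sum test; exact = FALSE uses the normal
# approximation, which also works when the real data contain ties.
res <- wilcox.test(sum_mod_count ~ group, data = ptm_counts, exact = FALSE)

res$statistic  # W: number of (long_lived, other) pairs where long_lived is larger
res$p.value
```

A rank-based test is a cautious default here because PTM counts are typically skewed, with a few heavily modified proteins and many with low counts.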

Bar chart

Is there less phosphorylation in young cells compared to old cells?

Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'hl_group', 'age_group'. You can override
using the `.groups` argument.

Acetylation

df_1 <- human_ptms_nuc_old_hl %>% mutate(age_group = 'nuc_old') %>% filter(unimod_id == 1)
df_2 <- human_ptms_nuc_young_hl %>% mutate(age_group = 'nuc_young') %>% filter(unimod_id == 1)
df_3 <- human_ptms_cyto_old_hl %>% mutate(age_group = 'cyto_old') %>% filter(unimod_id == 1)
df_4 <- human_ptms_cyto_young_hl %>% mutate(age_group = 'cyto_young') %>% filter(unimod_id == 1)

df <- rbind(df_1, df_2, df_3, df_4)
df <- df %>%
  mutate(hl_group = case_when(
    mean_hl_hours <= 50 ~ "0-50",
    mean_hl_hours <= 100 ~ "50-100",
    mean_hl_hours <= 150 ~ "100-150",
    mean_hl_hours <= 200 ~ "150-200",
    mean_hl_hours <= 250 ~ "200-250",
    TRUE ~ "250+"
  )) 

# One mean per half-life bin and age group; returning a single value per
# group avoids the multi-row summarise() deprecated in dplyr 1.1.0
mean_hours_per_hl_group <- df %>%
  group_by(hl_group, age_group) %>%
  summarise(mean_ptms_group = mean(norm_counts), .groups = "drop")
mod_group_colours <- c('nuc_old' = '#EA7317', 'cyto_old' = '#2364AA', 'nuc_young' = '#F9AE8B', 'cyto_young' = '#219EBC')
# Plot bar chart
ggplot(mean_hours_per_hl_group, aes(x = hl_group, y = mean_ptms_group, fill = age_group)) +
  geom_bar(stat = "identity", position = 'dodge') +
  scale_x_discrete(limits = c("0-50", "50-100", "100-150", "150-200", "200-250", "250+")) +
  scale_fill_manual(values = mod_group_colours, name = 'Key') +
  labs(x = "Half-lives (hours)",
       y = "Mean sum of normalised PTM counts") +
  theme_classic()

Phosphorylation

df_1 <- human_ptms_nuc_old_hl %>% mutate(age_group = 'nuc_old') %>% filter(unimod_id == 21)
df_2 <- human_ptms_nuc_young_hl %>% mutate(age_group = 'nuc_young') %>% filter(unimod_id == 21)
df_3 <- human_ptms_cyto_old_hl %>% mutate(age_group = 'cyto_old') %>% filter(unimod_id == 21)
df_4 <- human_ptms_cyto_young_hl %>% mutate(age_group = 'cyto_young') %>% filter(unimod_id == 21)

df <- rbind(df_1, df_2, df_3, df_4)
df <- df %>%
  mutate(hl_group = case_when(
    mean_hl_hours <= 50 ~ "0-50",
    mean_hl_hours <= 100 ~ "50-100",
    mean_hl_hours <= 150 ~ "100-150",
    mean_hl_hours <= 200 ~ "150-200",
    mean_hl_hours <= 250 ~ "200-250",
    TRUE ~ "250+"
  )) 

# One mean per half-life bin and age group; returning a single value per
# group avoids the multi-row summarise() deprecated in dplyr 1.1.0
mean_hours_per_hl_group <- df %>%
  group_by(hl_group, age_group) %>%
  summarise(mean_ptms_group = mean(norm_counts), .groups = "drop")
mod_group_colours <- c('nuc_old' = '#EA7317', 'cyto_old' = '#2364AA', 'nuc_young' = '#F9AE8B', 'cyto_young' = '#219EBC')
# Plot bar chart
ggplot(mean_hours_per_hl_group, aes(x = hl_group, y = mean_ptms_group, fill = age_group)) +
  geom_bar(stat = "identity", position = 'dodge') +
  scale_x_discrete(limits = c("0-50", "50-100", "100-150", "150-200", "200-250", "250+")) +
  scale_fill_manual(values = mod_group_colours, name = 'Key') +
  labs(x = "Half-lives (hours)",
       y = "Mean sum of normalised PTM counts") +
  theme_classic()

Look at oxPTMs

df_1 <- human_ptms_nuc_old_hl %>% mutate(age_group = 'nuc_old') %>% filter(unimod_id %in% oxPTMs$ID)
df_2 <- human_ptms_nuc_young_hl %>% mutate(age_group = 'nuc_young') %>% filter(unimod_id %in% oxPTMs$ID)
df_3 <- human_ptms_cyto_old_hl %>% mutate(age_group = 'cyto_old') %>% filter(unimod_id %in% oxPTMs$ID)
df_4 <- human_ptms_cyto_young_hl %>% mutate(age_group = 'cyto_young') %>% filter(unimod_id %in% oxPTMs$ID)

df <- rbind(df_1, df_2, df_3, df_4)
df <- df %>%
  mutate(hl_group = case_when(
    mean_hl_hours <= 50 ~ "0-50",
    mean_hl_hours <= 100 ~ "50-100",
    mean_hl_hours <= 150 ~ "100-150",
    mean_hl_hours <= 200 ~ "150-200",
    mean_hl_hours <= 250 ~ "200-250",
    TRUE ~ "250+"
  )) 

# One mean per half-life bin and age group; returning a single value per
# group avoids the multi-row summarise() deprecated in dplyr 1.1.0
mean_hours_per_hl_group <- df %>%
  group_by(hl_group, age_group) %>%
  summarise(mean_ptms_group = mean(norm_counts), .groups = "drop")
mod_group_colours <- c('nuc_old' = '#EA7317', 'cyto_old' = '#2364AA', 'nuc_young' = '#F9AE8B', 'cyto_young' = '#219EBC')
# Plot bar chart
ggplot(mean_hours_per_hl_group, aes(x = hl_group, y = mean_ptms_group, fill = age_group)) +
  geom_bar(stat = "identity", position = 'dodge') +
  scale_x_discrete(limits = c("0-50", "50-100", "100-150", "150-200", "200-250", "250+")) +
  scale_fill_manual(values = mod_group_colours, name = 'Key') +
  labs(x = "Half-lives (hours)",
       y = "Mean sum of normalised PTM counts") +
  theme_classic()